2A: Vector

Readings

From R Coding Basics: An Introduction to the Basics of Coding in R by Dr. Gaston Sanchez:

Topics

  • Vectors

  • Atomic types

  • Special values

  • Creating vectors with c(), :, seq(), and rep()

  • Useful functions for numeric vectors

  • Built-in vectors

Basic data structures

  • Basic data structures in R include vector, factor, matrix, array, data frame, and list.

  • These structures are characterized by their dimension and whether they require all elements to be of the same atomic type.
Structure    Dimensions   Same atomic type
Vector       1            Yes
Factor       1            Yes
Matrix       2            Yes
Data frame   2            No
Array        \(\ge 2\)    Yes
List         1            No

Vector

  • A vector is a sequence of elements that are of the same atomic type.

  • In R, the index of the first element is always 1.

#  1   2   3   4   5   6   7
c(70, 66, 82, 85, 78, 90, 73)
  • A single value is treated as a vector of one element
70    # same as c(70)
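A minimal sketch of the two points above. It uses square brackets [ ] to pull out an element by its position; bracket indexing is only previewed here, and the variable name v is just for illustration.

```r
# The exam scores from above, stored in a variable
v <- c(70, 66, 82, 85, 78, 90, 73)

v[1]                  # first element (R counts from 1): 70
v[7]                  # seventh and last element: 73

# A single value is simply a vector of length 1
length(70)            # 1
identical(70, c(70))  # TRUE
```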

Atomic types

  • An atomic type refers to the six fundamental types in R: logical, integer, double, character, raw, and complex.

  • Integer and double vectors are both considered numeric.

# logical vector
c(TRUE, FALSE, FALSE, TRUE, TRUE)
c(T, F, F, T, T)

# integer vector (numeric)
c(1L, 3L, 2L, 4L, 2L)

# double vector (numeric)
c(6.3, 8.2, 3.1, 4.4, 7.6)

# character vector
c("apple", "orange", "apple", "apple", "orange")  
c('dog', 'cat', 'dog', 'dog', 'cat')  

# raw and complex types also exist but are rarely used
  • The functions typeof() and storage.mode() tell us the atomic type of a vector.

  • The function mode() works similarly, except that it returns "numeric" for both integer and double.

typeof(2.3)
[1] "double"
storage.mode(2.3)
[1] "double"
mode(2.3)
[1] "numeric"
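The example above uses a double. For an integer (written with the L suffix), typeof() and storage.mode() report "integer" while mode() still reports "numeric". For logical and character values, all three functions agree. A short sketch:

```r
typeof(2L)         # "integer"
storage.mode(2L)   # "integer"
mode(2L)           # "numeric" (integer and double are both numeric)

typeof(TRUE)       # "logical"
mode(TRUE)         # "logical"
typeof("apple")    # "character"
mode("apple")      # "character"
```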
  • There are also dedicated checking functions for each atomic type. Each one takes a vector x and returns TRUE or FALSE.
is.logical(x)

is.integer(x)    # integer but not double
is.double(x)     # double but not integer
is.numeric(x)    # either integer or double

is.character(x)
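To see how these checks differ, here is a sketch applying them to one integer vector and one character vector. The variable names counts and fruits are made up for illustration.

```r
counts <- c(1L, 3L, 2L)          # integer vector
fruits <- c("apple", "orange")   # character vector

is.integer(counts)    # TRUE
is.double(counts)     # FALSE: an integer vector is not double
is.numeric(counts)    # TRUE: integer counts as numeric
is.character(counts)  # FALSE

is.character(fruits)  # TRUE
is.numeric(fruits)    # FALSE
is.logical(fruits)    # FALSE
```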

💻 Hands-On

Try the following R code to see what it returns.

enrollment <- c(10, 30, 15, 20)

typeof(enrollment)

storage.mode(enrollment)

mode(enrollment)

is.integer(enrollment)

is.double(enrollment)

is.numeric(enrollment)   

Note that although enrollment contains only whole numbers, R stores them as double unless you write them with the L suffix, as in c(10L, 30L, 15L, 20L).

enrollment <- c(10, 30, 15, 20)

typeof(enrollment)
[1] "double"
storage.mode(enrollment)
[1] "double"
mode(enrollment)
[1] "numeric"
is.integer(enrollment)
[1] FALSE
is.double(enrollment)
[1] TRUE
is.numeric(enrollment)  
[1] TRUE
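For comparison, a sketch of the same checks when the L suffix is used, so that R stores the values as integers (enrollment_int is an illustrative name):

```r
enrollment_int <- c(10L, 30L, 15L, 20L)

typeof(enrollment_int)      # "integer"
mode(enrollment_int)        # "numeric"
is.integer(enrollment_int)  # TRUE
is.double(enrollment_int)   # FALSE
is.numeric(enrollment_int)  # TRUE
```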

Special values

  • NULL represents the absence of an object (an empty object)

  • NA indicates a missing or “not available” value

  • NaN indicates a value that is “not a number”, such as the result of an undefined operation

  • Inf indicates positive infinity

  • -Inf indicates negative infinity

💻 Hands-On

Try the following R code to see what it returns.

sqrt(-7)

log(-5)

0 / 0

100 / 0

-100 / 0

log(0)
sqrt(-7)
[1] NaN
log(-5)
[1] NaN
0 / 0
[1] NaN
100 / 0
[1] Inf
-100 / 0
[1] -Inf
log(0)
[1] -Inf
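The hands-on above covers NaN and Inf. The sketch below contrasts NULL and NA: NA is a value that still occupies a position in a vector, whereas NULL is no object at all, so it disappears when combined with c().

```r
length(NA)        # 1: NA is a (missing) value
length(NULL)      # 0: NULL is the absence of an object

c(1, NA, 3)       # 1 NA 3: the missing value keeps its slot
c(1, NULL, 3)     # 1 3: NULL contributes nothing

is.na(NaN)        # TRUE: is.na() also treats NaN as missing
is.nan(NA)        # FALSE: but NA is not NaN
is.null(NULL)     # TRUE
is.infinite(-Inf) # TRUE
```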

Creating vectors

  • As shown previously, a vector can be created manually with the combine function c().
# logical vector
c(TRUE, FALSE, FALSE, TRUE, TRUE)
c(T, F, F, T, T)

# integer vector (numeric)
c(1L, 3L, 2L, 4L, 2L)

# double vector (numeric)
c(6.3, 8.2, 3.1, 4.4, 7.6)

# character vector
c("apple", "orange", "apple", "apple", "orange")  
c('dog', 'cat', 'dog', 'dog', 'cat') 
  • Elements in a vector can have names!

  • We can give names directly in c()

c(exam1 = 90, exam2 = 85, final = 92)
exam1 exam2 final 
   90    85    92 
  • We can also create the vector and assign names later.
scores <- c(90, 85, 92)
names(scores) <- c('exam1', 'exam2', 'exam3')
scores
exam1 exam2 exam3 
   90    85    92 
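Once names are set, names() also reads them back, a single name can be changed, and assigning NULL removes them all. A short sketch:

```r
scores <- c(exam1 = 90, exam2 = 85, final = 92)

names(scores)                 # "exam1" "exam2" "final"

names(scores)[3] <- "exam3"   # rename only the third element
names(scores)                 # "exam1" "exam2" "exam3"

names(scores) <- NULL         # remove all names
scores                        # 90 85 92
```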

💻 Hands-On

  • Use c() to create a short vector for each of these four atomic types: integer, double, logical, and character.

  • Assign each vector to a variable with a descriptive name.

  • Choose one of your vectors and assign names to its elements.

# integer
experience <- c(1L, 3L, 5L, 2L)

# double
weight <- c(143.5, 150, 127.3, 133.5)

# logical
in_stock <- c(TRUE, FALSE, FALSE, TRUE)

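# character
student_levels <- c('Junior', 'Freshman', 'Junior', 'Senior')

# assign names to the elements of one vector (the names are illustrative)
names(weight) <- c('Ana', 'Ben', 'Cara', 'Dev')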

Creating numeric vectors

The colon operator

  • The colon operator : generates a numeric sequence in steps of 1 (increasing or decreasing) with the syntax
start:end    # end acts as an upper (or lower) bound

-2:5       # start with -2, increase by 1
 
5:-2       # start with 5, decrease by 1

3.7:9.2    # start with 3.7, increase by 1
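The results of the three lines above, plus a check of their type. When the start is a whole number, : produces an integer vector without needing the L suffix; with a fractional start it produces a double.

```r
-2:5              # -2 -1  0  1  2  3  4  5
5:-2              #  5  4  3  2  1  0 -1 -2
3.7:9.2           # 3.7 4.7 5.7 6.7 7.7 8.7 (stops before passing 9.2)

typeof(-2:5)      # "integer"
typeof(3.7:9.2)   # "double"
```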

💻 Hands-On

Use the colon operator : to quickly create the following vectors

c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

c(17, 16, 15, 14, 13, 12, 11, 10, 9)

Because each vector consists of consecutive whole numbers, the colon operator : creates it directly.

3:12   
 [1]  3  4  5  6  7  8  9 10 11 12
17:9   
[1] 17 16 15 14 13 12 11 10  9

The seq() function

  • The seq() function generates a numeric sequence with a more general step size.
# step size of 2
seq(from = -2, to = 5, by = 2)            
[1] -2  0  2  4
# step size of 0.75
seq(from = -2, to = 5, by = 0.75)         
 [1] -2.00 -1.25 -0.50  0.25  1.00  1.75  2.50  3.25  4.00  4.75
# step size is computed automatically to give 6 values
seq(from = -2, to = 5, length.out = 6)    
[1] -2.0 -0.6  0.8  2.2  3.6  5.0
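One detail that matters for the exercises below: by must point from from toward to. In this sketch a negative step counts down, while a positive step from 5 to -2 makes seq() stop with an error. The error is caught with tryCatch() only so the script keeps running; result is an illustrative name.

```r
seq(from = 5, to = -2, by = -1)   # 5 4 3 2 1 0 -1 -2

# A positive step cannot move from 5 down to -2, so seq() raises an error
result <- tryCatch(
  seq(from = 5, to = -2, by = 1),
  error = function(e) conditionMessage(e)
)
result   # the error message, as a character string
```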

💻 Hands-On

Use the seq() function to create the vector that

  • Starts at 5 and ends at -3, increasing or decreasing by 1

  • Starts at -1 with a step size of 1.5 and does not go past 7.

  • Starts at 0 and ends at 14, containing 5 equally spaced values.

  • Starts at -4 and ends at 4, including only every other number.

seq(from = 5, to = -3, by = -1)
[1]  5  4  3  2  1  0 -1 -2 -3
seq(from = -1, to = 7, by = 1.5)
[1] -1.0  0.5  2.0  3.5  5.0  6.5
seq(from = 0, to = 14, length.out = 5)
[1]  0.0  3.5  7.0 10.5 14.0
seq(from = -4, to = 4, by = 2)
[1] -4 -2  0  2  4

The rep() function

  • The rep() function creates vectors with repeated elements.
# repeat -1 five times
rep(-1, times = 5)                      
[1] -1 -1 -1 -1 -1
# repeat c(-1, 0, 3) four times
rep(c(-1, 0, 3), times = 4)             
 [1] -1  0  3 -1  0  3 -1  0  3 -1  0  3
# repeat -1 two times, 0 three times, 3 four times
rep(c(-1, 0, 3), times = c(2, 3, 4))    
[1] -1 -1  0  0  0  3  3  3  3
# repeat -1 five times, 0 five times, 3 five times
rep(c(-1, 0, 3), each = 5)              
 [1] -1 -1 -1 -1 -1  0  0  0  0  0  3  3  3  3  3
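rep() is not limited to numbers, and each and times can be combined: each is applied first, then the whole result is repeated. A short sketch (the labels are made up for illustration):

```r
# rep() works with character vectors too
rep(c("control", "treatment"), times = 3)
# "control" "treatment" "control" "treatment" "control" "treatment"

# each = 2 doubles every element, then times = 2 repeats the whole result
rep(c(-1, 0, 3), each = 2, times = 2)
# -1 -1  0  0  3  3 -1 -1  0  0  3  3
```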

💻 Hands-On

Use the rep() function to create the vector in which

  • Each value in the vector c(1, 3, 6) is repeated exactly 4 times

  • The value 4 is repeated 6 times

  • The vector c(2, -1, 1) is repeated 3 times

  • The values in c(5, 0, -2) are repeated so that 5 appears once, 0 appears three times, and -2 appears four times.

Answer:

rep(c(1, 3, 6), each = 4)
 [1] 1 1 1 1 3 3 3 3 6 6 6 6
rep(4, times = 6)
[1] 4 4 4 4 4 4
rep(c(2, -1, 1), times = 3)
[1]  2 -1  1  2 -1  1  2 -1  1
rep(c(5, 0, -2), times = c(1, 3, 4))
[1]  5  0  0  0 -2 -2 -2 -2

Summary functions for numeric vectors

Consider a vector

#       1   2   3   4   5   6   7
v <- c(70, 66, 82, 85, 78, 90, 73)
  • length() returns its length

  • min() returns its minimum value

  • max() returns its maximum value

  • which.min() returns the index of its minimum value

  • which.max() returns the index of its maximum value

  • sum() returns the sum of its elements

  • prod() returns the product of its elements

length(v)
[1] 7
min(v)
[1] 66
which.min(v)
[1] 2
max(v)
[1] 90
which.max(v)
[1] 6
sum(v)
[1] 544
prod(v)
[1] 1.650193e+13
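When the minimum or maximum occurs more than once, which.min() and which.max() return the position of its first occurrence. A small sketch with made-up values:

```r
ties <- c(5, 2, 9, 2, 9)

which.min(ties)   # 2: the value 2 is at positions 2 and 4; the first wins
which.max(ties)   # 3: the value 9 is at positions 3 and 5; the first wins
min(ties)         # 2
max(ties)         # 9
```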

💻 Hands-On

Consider a vector containing the calorie content of several light beer brands. Write R code to answer the following questions:

  • How many light beer brands are included?

  • What is the lowest calorie content among the light beers?

  • What is the highest calorie content among the light beers?

  • At which position does the highest calorie value occur?

  • What is the total calorie content across all light beer brands?

# Calories per 100ml of light beer
beer_cals <- c(29, 28, 33, 31, 30, 33, 30, 28, 27, 41, 39, 31, 29, 
               23, 32, 31, 32, 19, 40, 22, 34, 31, 42, 35, 29, 43)
# Number of light beer brands
length(beer_cals)
[1] 26
# Lowest calorie content
min(beer_cals)
[1] 19
# Highest calorie content
max(beer_cals)
[1] 43
# Position of the highest calorie value
which.max(beer_cals)
[1] 26
# Total calorie content
sum(beer_cals)
[1] 822

💻 Hands-On

Try the following R code to see what it returns.

#       1   2   3   4   5   6   7
v <- c(70, 66, 82, 85, NA, 90, 73)

length(v)

min(v)

which.min(v)

max(v)

which.max(v)

sum(v)

prod(v)

Note that the vector v contains a missing value NA

#       1   2   3   4   5   6   7
v <- c(70, 66, 82, 85, NA, 90, 73)

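Therefore, min(), max(), sum(), and prod() return NA, because their result cannot be determined without the missing value. In contrast, which.min() and which.max() discard missing values before searching, so they still return a position, and length() still counts the NA as an element.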

length(v)
[1] 7
min(v)
[1] NA
which.min(v)
[1] 2
max(v)
[1] NA
which.max(v)
[1] 6
sum(v)
[1] NA
prod(v)
[1] NA
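The functions min(), max(), sum(), and prod() accept an na.rm argument (short for "NA remove"). Setting na.rm = TRUE drops missing values before computing, as in this sketch:

```r
v <- c(70, 66, 82, 85, NA, 90, 73)

min(v, na.rm = TRUE)   # 66
max(v, na.rm = TRUE)   # 90
sum(v, na.rm = TRUE)   # 466 (the NA is skipped)
length(v)              # 7: length() still counts the NA position
sum(is.na(v))          # 1: the number of missing values
```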

Built-in vectors

  • R includes several built-in vectors for the letters of the alphabet, the constant \(\pi\), month names, and US states.

💻 Hands-On

Try the following R code and see what it returns. Feel free to consult the help documentation (for example, ?state.name).

LETTERS
letters

month.abb
month.name

pi

state.abb
state.name
state.area
LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
pi
[1] 3.141593
state.abb
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
[16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
[31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
[46] "VA" "WA" "WV" "WI" "WY"
state.name
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"      
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"      
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"         
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"        
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"      
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"        
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"      
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"          
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia" 
[49] "Wisconsin"      "Wyoming"       
state.area
 [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876
[11]   6450  83557  56400  36291  56290  82264  40395  48523  33215  10577
[21]   8257  58216  84068  47716  69686 147138  77227 110540   9304   7836
[31] 121666  49576  52586  70665  41222  69919  96981  45333   1214  31055
[41]  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914
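Built-in vectors work with the summary functions from earlier. This sketch counts the states and finds the largest and smallest by area (state.area is given in square miles); the square-bracket lookup by position is only previewed here.

```r
length(state.name)                  # 50
which.max(state.area)               # 2
state.name[which.max(state.area)]   # "Alaska"
which.min(state.area)               # 39
state.name[which.min(state.area)]   # "Rhode Island"
length(letters)                     # 26
```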